Linking Social Media Reports to Network Indicators of DoS Attacks

Evan  Wright  (CERT),  Rhiannon  Weaver  (CERT),  Alan  Ritter  (Ohio  State  University) 


Introduction  and  Background 

Social  awareness  of  denial  of  service  (DoS)  as  a  cybersecurity  threat  has  led  to  public  reporting  in 
fast-paced  social  media  such  as  Twitter,  but  these  reports  are  rarely  linked  to  quantifiable  network 
behavior. A data set of network-based vs. media-reported DoS attacks can help researchers determine the
prevalence of DoS tactics such as IP spoofing, the intensity and duration that lead to media reporting, and
the types of organizations that are reported or under-reported. We used heuristics and machine learning
methods  to  link  DoS-related  tweets  to  flow-based  evidence,  collected  from  a  large  private  network,  of  DoS 
activity  targeting  entities  extracted  from  those  tweets.  Preliminary  results  show  promise  for  novel  data 
visualization  and  future  refinement  for  formal  inference. 


Methods  and  Data  for  Multi-Step  Correlation 


Analysis  and  Results 


Preliminary  Evaluation  Data 

Random sample of 30 (D, E) pairs yielding 21 unique


Tweets

We compiled tweets from a seed event table, as well as tweets with the
hashtag #DDoS (Ritter et al.), for two date ranges: April 7 to July 20, 2014,
and October 21 to November 9, 2014.


Date        Entity    Tweets
2014/06/11  feedly    26
2014/06/11  evernote  36



Entities

We employed clustering methods and natural language processing (NLP) to
map the tweets into unique date-entity pairs.

Result: 533 unique date-entity pairs: (D, E)
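
The grouping into date-entity pairs can be illustrated with a minimal Python sketch. It assumes the entity has already been extracted from each tweet by the upstream clustering and NLP step, which is not reproduced here; the function name and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def count_date_entity_pairs(tweets):
    """Count tweets per (date, entity) pair.

    `tweets` is an iterable of (date, entity) tuples. The entity is
    assumed to come from an upstream extraction step (hypothetical here).
    """
    counts = Counter()
    for day, entity in tweets:
        # Normalize case and whitespace so "Feedly" and "feedly " collapse.
        counts[(day, entity.strip().lower())] += 1
    return counts


# Made-up records for illustration only.
sample = [
    (date(2014, 6, 11), "Feedly"),
    (date(2014, 6, 11), "feedly "),
    (date(2014, 6, 11), "Evernote"),
]
pairs = count_date_entity_pairs(sample)
```

Each key of `pairs` is one (D, E) pair, and its value is the number of tweets supporting that pair.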


Domains 

We used a Google API to return the top three domain names for each entity
from the previous step (with a whitelist of Wikipedia, YouTube, etc.).

Result:  355  (D,  E)  pairs  mapped  to  at  least  one  domain  name:  (D,  E,  N) 
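
A rough Python sketch of the domain-selection step follows. It assumes search results arrive as a ranked list of URLs and, as an interpretation not confirmed by the text, that the listed sites (Wikipedia, YouTube, etc.) are generic results to skip. The function name, the skip list, and the sample URLs are hypothetical, and no actual search API call is shown.

```python
from urllib.parse import urlparse

# Assumption: these generic sites are skipped rather than kept.
SKIP_DOMAINS = {"wikipedia.org", "youtube.com"}


def top_domains(result_urls, skip=SKIP_DOMAINS, limit=3):
    """Return up to `limit` distinct hostnames from ranked result URLs."""
    domains = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host:
            continue
        # Skip a listed site and any of its subdomains.
        if any(host == s or host.endswith("." + s) for s in skip):
            continue
        if host not in domains:
            domains.append(host)
        if len(domains) == limit:
            break
    return domains
```

For example, `top_domains(["https://en.wikipedia.org/wiki/Feedly", "https://feedly.com/"])` would keep only `feedly.com` under these assumptions.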


IP Addresses

We  used  the  Security  Information  Exchange  (SIE)  passive  DNS  data 
(February-October  2014)  to  link  domains  from  the  previous  step  to  IP 
addresses  and  dates. 

Result:  345  (D,  E,  N)  tuples  mapped  to  at  least  one  IP  address:  (D,  E,  N,  I) 
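
One way to picture the passive DNS link is as a date-window lookup. The sketch below assumes each passive DNS record has been reduced to a (domain, ip, first_seen, last_seen) tuple; this record layout and the function name are illustrative assumptions, not the SIE data format.

```python
from datetime import date


def ips_for(domain, day, pdns_records):
    """Return the sorted IPs observed for `domain` on `day`.

    `pdns_records` is an iterable of (domain, ip, first_seen, last_seen)
    tuples, an assumed simplification of passive DNS observations.
    """
    return sorted(
        {
            ip
            for name, ip, first_seen, last_seen in pdns_records
            if name == domain and first_seen <= day <= last_seen
        }
    )
```

A (D, E, N) tuple maps to at least one IP address when this lookup returns a non-empty list for its domain N on date D.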


Network  Activity 

We  used  SiLK  to  find  those  IP  addresses  with  evidence  of  backscatter  from 
random  spoofing:  TCP  SYN-ACK  flags  sent  to  >500  internal  machines  in  a 
large  private  network,  with  corresponding  RST  flags  sent  from  >500  internal 
machines (see Moore et al.).

Result: 178 (D, E, N, I) tuples mapped to backscatter flow events: (D, E, N, I, F)
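
The backscatter heuristic can be sketched in Python over simplified flow records. This is not the SiLK query used for the analysis; it assumes each flow is a (src, dst, flags, direction) tuple, where `flags` is a string of TCP flag letters such as "SA" and `direction` is "in" for traffic entering the private network or "out" for traffic leaving it. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def backscatter_candidates(flows, threshold=500):
    """Return external IPs whose traffic resembles spoofing backscatter.

    An IP qualifies if it sent SYN-ACK flows to more than `threshold`
    distinct internal hosts and received RST flows from more than
    `threshold` distinct internal hosts.
    """
    synack_targets = defaultdict(set)  # external IP -> internal hosts it hit
    rst_sources = defaultdict(set)     # external IP -> internal hosts that reset

    for src, dst, flags, direction in flows:
        flag_set = set(flags)
        if direction == "in" and flag_set == {"S", "A"}:
            synack_targets[src].add(dst)
        elif direction == "out" and "R" in flag_set:
            rst_sources[dst].add(src)

    return {
        ip
        for ip, hosts in synack_targets.items()
        if len(hosts) > threshold and len(rst_sources.get(ip, ())) > threshold
    }
```

Counting distinct internal hosts, rather than raw flows, keeps a single chatty host pair from triggering the threshold.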


Individual  Campaigns 


Future Work

In  this  analytic,  there  are  many  potential  reasons  for  finding  or  failing  to  find  connections 
in  each  of  the  correlation  steps.  We  plan  to  conduct  a  detailed  error  analysis  to  learn 
these  reasons  and  to  understand  their  relative  likelihoods  for  formal  statistical  inference. 
As suggested by the preliminary evaluation, our immediate next steps will focus on
improving both the extraction of specific targeted entities from tweets and the
correlation of those entities to domains.